Abstract

Cell differentiation in multicellular organisms is a complex process whose mechanism 
can be understood by a reductionist approach, in which the individual processes that 
control the generation of different cell types are identified. Alternatively, a large scale 
approach in search of different organizational features of the growth stages promises 
to reveal its modular global structure with the goal of discovering previously unknown 
relations between cell types. Here we sort and analyze a large set of scattered data 
to construct the network of human cell differentiation (NHCD) based on cell types 
(nodes) and differentiation steps (links) from the fertilized egg to a crying baby. We 
discover a dynamical law of critical branching, which reveals a fractal regularity in 
the modular organization of the network, and allows us to observe the network at 
different scales. The emerging picture clearly identifies clusters of cell types following 
a hierarchical organization, ranging from sub-modules to super-modules of special- 
ized tissues and organs on varying scales. This discovery will allow one to treat the 
development of a particular cell function in the context of the complex network of 
human development as a whole. Our results point to an integrated large-scale view of 
the network of cell types systematically revealing ties between previously unrelated 
domains in organ functions. 
The cell differentiation process plays a crucial role in the prenatal development of
multicellular organisms. Recent advances in the research on stem cell properties and embryonic
development have uncovered several steps in the differentiation process [1-8]. Single and
multiple sequences of cell differentiation have been identified through in-vivo observations
of a particular embryo during early stages of development and through pathology studies of
miscarriages during late stages of the process. While the identification of each cell
differentiation step has been the subject of intense research, an integrated view of this complex
process is still missing. Such a global view promises to reveal features associated with the
large-scale modular organization of the cell types [1-7], [9-13], with the purpose of
discovering new functional modules between cell types using novel theoretical network analysis
for community detection [10-12]. In this letter, we take advantage of the current knowledge on
the sequence of cell differentiation processes, which is spread over a vast specialized
literature [1-6, 14-28] (see the Supplementary Information, SI-Table I, and references therein),
to reveal and characterize the topological and dynamical features associated with the network
of human cell differentiation (NHCD).

I. RESULTS 



We construct the NHCD by systematically gathering the scattered information on the
evolution of each cell type present in the embryo and fetus from a predecessor with a higher
degree of differentiation potential into a more specialized type. The process of cell
differentiation is then mapped onto a complex network which consists of 873 nodes connected
through 977 edges. The nodes in the network represent distinct cell types reported in the
literature [1-6, 14-28] and the edges represent the association between two cell types through
a differentiation event.

The initial steps of the NHCD are shown in the inset of Fig. 1, while the resulting network
structure is shown in the main panel of Fig. 1 (see also SI-Figs. 5a and 5b). The fertilized egg
is followed by the ball stage and the formation of the primary germ cell layers. Currently, it
is known that until the ball stage, cell division is symmetric and produces further totipotent
stem cells [1]. These cells then give rise to all the differentiated tissues of the organism as well
as the extra-embryonic tissues (placenta, umbilical cord, etc.). Moreover, in the course of
the entire process of organism formation, there is a monotonic decrease in the differentiation
potential (totipotent → pluripotent → multipotent → unipotent cells) accompanied by
an increase in cell specialization.

Certain types of cells can be generated following more than one path from the fertilized 
egg. This process generates some closed loops of edges in the network. The NHCD comprises 
529 branches of different lengths with each branch ending when the cell types do not undergo 
further differentiation. Note, however, that the most recent compilation of cell types in
normal, healthy human adults, Ref. [6], reports only 407 cell types. Therefore, not all
branch endpoints correspond to cell types in born humans. Thus, not all 873 cell types are 
present in a human being. Among those absent are the placenta cells that are generated 
from the fertilized egg during embryo development, as well as other somatic cell types that 
are important to control embryo and fetus development. The cell types that survive in 
a human are denoted by filled circles, while non-surviving ones are indicated by empty 
circles. The complete collected data is listed in the Supplementary Information, including 
an enumeration of cells and links between the cell types, their time of appearance in days 
after fecundation ($T_a$), and the reference to the publications reporting each link. To the
best of our knowledge, the structure identified here provides the most complete schematic 
diagram of the human differentiation process to date. 
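The mapping from differentiation steps to a network can be illustrated with a minimal sketch. The link list below is a hypothetical toy example loosely following the first steps named in the caption of Fig. 1; it is not the collected dataset, and the function names `build_network` and `branch_endpoints` are assumptions made for illustration. A branch endpoint is taken here to be a cell type with no recorded daughter type.

```python
from collections import defaultdict

# Hypothetical toy link list: (parent cell type, daughter cell type).
# These entries are illustrative only, not the NHCD data.
links = [
    ("fertilized egg", "blastocyst"),
    ("blastocyst", "trophoblast"),
    ("blastocyst", "inner cell mass"),
    ("inner cell mass", "ectoderm"),
    ("inner cell mass", "mesoderm"),
    ("inner cell mass", "endoderm"),
]

def build_network(links):
    """Return the set of cell types and a parent -> daughters mapping."""
    children = defaultdict(list)
    nodes = set()
    for parent, child in links:
        children[parent].append(child)
        nodes.update((parent, child))
    return nodes, children

def branch_endpoints(nodes, children):
    """Cell types that do not differentiate further (no outgoing link)."""
    return sorted(n for n in nodes if not children.get(n))

nodes, children = build_network(links)
leaves = branch_endpoints(nodes, children)
print(len(nodes), "cell types,", len(links), "links,", len(leaves), "endpoints")
```

In this toy network the endpoints are the trophoblast and the three germ layers; in the full NHCD the same bookkeeping would give the branch endpoints discussed above.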

It is visually apparent from Fig. 1 that the NHCD has a prominent modular structure.
The continuous differentiation of cells into more specialized functions naturally leads to
the formation of dense isolated clusters in the NHCD. As a first approach to understand
this modular structure we cluster cell types in the network of Fig. 1 according to their
known functions; different colors indicate 19 functional modules extracted from the literature
(C1-C19) (see SI-Table I and references therein; the largest communities were extracted
from Refs. [1-6, 14-28]). There is, however, a certain degree of arbitrariness in this modular
structure, as the separation of the nodes into communities in our dataset is not unique. For
instance, community C12, the neural lineage, could be divided into two sub-communities,
representing the neural and the supporting (glial) cells. On the other
hand, the neural system module could be merged with the eye system module
on a larger scale, since they have a common ancestral cell type. Therefore, a finer or coarser
community structure can be extracted from the data when we look at the whole network
at different scales of observation; a novel module-detection algorithm is needed to identify
these communities in a systematic way.
Graph theoretical concepts allow us to unravel the scale dependence of the modular
structure of the NHCD. Graph theory [12] defines the distance between two nodes (also called
the chemical distance) as the number of links along the shortest path between the nodes
in the network. We use this notion to propose a community detection algorithm that
identifies modules of size $\ell$ composed of highly connected cell types. The algorithm finds the
optimal tiling of the network with the smallest possible number of modules, $N_B$, of size
$\ell$ [13] (each node is assigned to a module or box and all nodes in a module are at
distance smaller than $\ell$). This process results in an optimization problem which can be solved
using the box-covering algorithm explained in Fig. 2a and Materials and Methods (Section III)
and reported in [29] as the Maximum Excluded Mass Burning algorithm (MEMB; the
algorithm can be downloaded from http://lev.ccny.cuny.edu/~hmakse/soft_data.html).
The requirement of a minimal number of modules to cover the network ($N_B$) guarantees that
the partition of the network is such that each module contains the largest possible number
of nodes and links inside the module, with the constraint that the modules cannot exceed
size $\ell$. This optimized tiling process gives rise to modules with the fewest number of links
connecting to other modules, implying that the degree of modularity, defined by [10]

$$\mathcal{M}(\ell) = \frac{1}{N_B} \sum_{i=1}^{N_B} \frac{L_i^{\mathrm{in}}}{L_i^{\mathrm{out}}}, \tag{1}$$

is maximized. Here $L_i^{\mathrm{in}}$ and $L_i^{\mathrm{out}}$ represent the number of links that start in a given module
$i$ and end either within or outside $i$, respectively. Large values of $\mathcal{M}$ ($L_i^{\mathrm{out}} \to 0$) correspond
to a higher degree of modularity. The value of the modularity of the network $\mathcal{M}$ varies
with $\ell$, so that we can detect the dependence of modularity on different length scales, or
equivalently how the modules themselves are organized into larger modules that enhance
the degree of modularity.
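As an illustration of how the degree of modularity in Eq. (1) can be evaluated for a given partition, the following sketch averages the ratio of internal to external links over all modules. It rests on assumptions that are not stated in the text: links are treated as undirected, a link between two modules counts as an outgoing link for both, and a module with no outgoing links is assigned an infinite ratio. The function name `modularity` and the toy network are hypothetical.

```python
def modularity(edges, module_of):
    """Average over modules of (links inside module) / (links leaving module)."""
    modules = set(module_of.values())
    l_in = {m: 0 for m in modules}
    l_out = {m: 0 for m in modules}
    for u, v in edges:
        mu, mv = module_of[u], module_of[v]
        if mu == mv:
            l_in[mu] += 1          # link stays inside one module
        else:
            l_out[mu] += 1         # link crosses a module boundary:
            l_out[mv] += 1         # counted as outgoing for both ends
    ratios = [
        l_in[m] / l_out[m] if l_out[m] else float("inf")
        for m in modules
    ]
    return sum(ratios) / len(ratios)

# Toy network: two triangles joined by a single bridge link.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
module_of = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}
M = modularity(edges, module_of)
print(M)
```

Each triangle has three internal links and one outgoing link, so both ratios equal 3 and so does their average.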

For a given $\ell$, we obtain the optimal coverage of the network with $N_B$ modules (we use
the MEMB algorithm [29] explained in Fig. 2a and Materials and Methods). Analysis of
the modularity Eq. (1) in Fig. 3a reveals a monotonic increase of $\mathcal{M}(\ell)$ with a lack of a
characteristic value of $\ell$. Indeed, the data can be approximately fitted with a power-law
functional form:

$$\mathcal{M}(\ell) \sim \ell^{d_M}, \tag{2}$$

which is detected through the modularity exponent $d_M$. We characterize the network using
different snapshots in time and we find that $d_M \approx 2.0$ is approximately constant over the
time evolution (Fig. 3a). This value reveals a considerable degree of modularity in the
entire system (for comparison, a random network has $d_M = 0$ and a uniform lattice has
$d_M = 1$ [10]), as evidenced by the network structure in Fig. 1. The lack of a characteristic
length-scale in the modularity shown in Fig. 3a suggests that the modules appear at all
length-scales, i.e. modules are organized within larger modules in a self-similar way, so that
the inter-connections between those clusters repeat the basic modular character of the entire
NHCD. Thus, the NHCD remains statistically invariant when observed at different scales.
Varying the module size $\ell$ yields the scaling relation for the number of modules (Fig. 3b):

$$N_B(\ell) \sim \ell^{-d_B}, \tag{3}$$

where $d_B$ represents the fractal dimension of the network [13]. We find that the fractal
character of the modules is established at the early stages, yielding $d_B \approx 1.4$ as early as 30
days (Fig. 3b). As the network evolves, the fractal dimension increases slightly and finally
reaches $d_B \approx 1.9$.
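A power-law exponent such as the one in the scaling of the number of modules with module size is commonly estimated by a least-squares fit in log-log coordinates. The sketch below shows that procedure on synthetic numbers generated from an exact power law with exponent 1.9; these are not measured values from the NHCD, and the prefactor, the chosen box sizes, and the function name `fit_power_law_exponent` are assumptions for illustration. The fitting method used for the figures is not specified in the text.

```python
import math

def fit_power_law_exponent(xs, ys):
    """Slope of the least-squares line through (log x, log y)."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    n = len(lx)
    mean_x = sum(lx) / n
    mean_y = sum(ly) / n
    sxx = sum((a - mean_x) ** 2 for a in lx)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(lx, ly))
    return sxy / sxx

# Synthetic data only: N_B = 500 * ell^(-1.9), evaluated at a few box sizes.
box_sizes = [3, 5, 7, 11, 15, 19]
n_boxes = [500.0 * ell ** -1.9 for ell in box_sizes]

# The number of boxes decreases with ell, so d_B is minus the fitted slope.
d_B = -fit_power_law_exponent(box_sizes, n_boxes)
print(round(d_B, 3))
```

The same function applied to modularity values as a function of module size would return the modularity exponent directly, without the sign change.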

The significance of Eq. (3) is that the modules need to be interpreted at a given
length-scale. Figure 2b shows an example of such hierarchical organization [Fig. 2c and SI-Fig. 6
show the full modular structure, while a list of detected modules appears in the SI Appendix].
Three types of communities of cell types are clearly identified in Fig. 2b as we change $\ell$.
(i) The known functional modules: The entire eye lineage [14] is detected as a
single module by the box-covering algorithm at $\ell = 11$, while the entire neural lineage
appears at $\ell = 15$. Finer and coarser novel modules are identified by the
algorithm. (ii) Sub-modules: At $\ell = 11$ the neural lineage is split into the main neural
and the supporting glial cell modules, while for $\ell = 7$ sub-modules are identified in the
eye system. (iii) Super-modules: When we increase the length to $\ell = 19$, the eye and
neural system form a single super-module. Thus, cell types are connected in such a way
that groups of nodes of all sizes self-organize following a single principle. This
property allows us to renormalize the network [13] by replacing each detected module by
a single supernode to identify the network of modules as shown in Fig. 2c. Following the
evolution and inter-dependence of these super-modules, as seen in Fig. 2c, identifies families
of cell types at varying scales. This modularity map is useful in proposing future research
ties between previously unrelated domains in organ functions.
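The renormalization step, in which each detected module is replaced by a single supernode, can be sketched as follows. This is a minimal illustration under assumptions: links are undirected, multiple links between the same pair of modules are collapsed into one, and the function name `renormalize`, the toy chain, and the module labels are hypothetical.

```python
def renormalize(edges, module_of):
    """Collapse each module into one supernode; keep links between modules."""
    super_edges = set()
    for u, v in edges:
        mu, mv = module_of[u], module_of[v]
        if mu != mv:
            # Store each inter-module link once, regardless of direction.
            super_edges.add((min(mu, mv), max(mu, mv)))
    supernodes = sorted(set(module_of.values()))
    return supernodes, sorted(super_edges)

# Toy chain of six nodes partitioned at a fine scale into three modules.
edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5)]
fine = {0: "a", 1: "a", 2: "b", 3: "b", 4: "c", 5: "c"}
fine_nodes, fine_links = renormalize(edges, fine)

# At a coarser scale two of the modules merge into one super-module.
coarse = {0: "ab", 1: "ab", 2: "ab", 3: "ab", 4: "c", 5: "c"}
coarse_nodes, coarse_links = renormalize(edges, coarse)
print(fine_nodes, fine_links)
print(coarse_nodes, coarse_links)
```

Applying the function with partitions obtained at increasing module sizes gives a sequence of progressively smaller networks of modules, which is the kind of picture that a network-of-modules figure summarizes.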

The dynamics leading to such a structure can be unraveled by the study of the NHCD
as a growth process. The knowledge of the time of appearance of each cell type, $T_a$, makes
it possible to follow the cumulative growth of the embryo and fetus in terms of the total
number of cell types at time $t$, $N(t)$, as well as the number of cell types that eventually
survive in the organism (Fig. 4a). As expected, surviving cells emerge in the later stages of
the gestation period. However, the difference between the total and the surviving number
of cell types indicates that generation of new types of non-surviving cells takes place even
during the final gestation months.

The increase of the network size, $N(t)$, is initially approximately exponential and after
$t^* = 40$ days changes into a slower growth (Fig. 4a). Only a small percentage of the nodes
grow within a given time interval, so that the network activity is focused on a small number
of them at a given time. The number of nodes that differentiate at a given time is shown
in Fig. 4b. We observe an activity that increases monotonically up to around $t^* = 40$ days
and then drops to lower values. The cross-over time $t^* = 40$ days observed in Figs. 4a
and 4b separates two regimes of growth and approximately corresponds to the time below
which most of the cells have a plastic characteristic (i.e., the capability to differentiate) and
above which they start to become functional [1]. Interestingly, the two regimes observed in
$N(t)$ merge into a single universal functional curve when we replot $N$ as a function
of the chemical distance to the fertilized egg, $\ell_{N1}$ (Fig. 4c). This result suggests that the
topological distance in the network, $\ell_{N1}$, rather than the time, is the natural variable to
characterize the growth process in a universal form. The dynamics of $N(\ell)$ follows a typical
logistic (Verhulst) process of population growth where the rate of growth is restricted by
environmental limitations:

$$\frac{dN}{d\ell} = r N \left(1 - \frac{N}{N_f}\right), \tag{4}$$

with solution,

$$N(\ell) = \frac{N_f \exp(r\ell)}{N_f + (\exp(r\ell) - 1)} \tag{5}$$

(see the fitting in Fig. 4c), where $N_f$ is the final number of cell types and $r = 0.65$ is the
growth rate of cell types.
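The closed-form logistic solution can be checked numerically against a direct integration of the logistic growth equation. The sketch below does this with a simple forward Euler scheme starting from a single node at distance zero, which is the initial condition implied by the closed form. The value of 873 used for the final number of cell types is an assumption taken from the total node count of the network, and the function names are hypothetical.

```python
import math

def logistic_closed_form(ell, r, n_final):
    """Closed-form logistic curve with N(0) = 1."""
    e = math.exp(r * ell)
    return n_final * e / (n_final + (e - 1.0))

def logistic_euler(ell_max, r, n_final, steps=100_000):
    """Forward Euler integration of dN/dl = r N (1 - N / N_f), N(0) = 1."""
    n = 1.0
    h = ell_max / steps
    for _ in range(steps):
        n += h * r * n * (1.0 - n / n_final)
    return n

r = 0.65          # growth rate quoted in the text
n_final = 873.0   # assumed here: total number of cell types in the network

exact = logistic_closed_form(10.0, r, n_final)
numeric = logistic_euler(10.0, r, n_final)
print(exact, numeric)
```

The two values agree to within the discretization error of the Euler scheme, and the closed form starts at one cell type and saturates at the assumed final number for large distances.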

Analysis of the network connectivity reveals that the average number of links per node in
the final stages of the entire NHCD is $\langle k \rangle = 2.24$ (Fig. 4d). Even though $\langle k \rangle \approx 2$, there is a
broad degree distribution (scale-free [12], $P(k) \sim k^{-\gamma}$, $\gamma \approx 3.0$, SI-Fig. 7). This implies that
there is always a small number of crucial cell types that differentiate much more than the
others, a fact that agrees with evidence on the existence of a few cells with large plasticity
potential. As this potential is rapidly lost after 40 days, cell types change their development
ability in favor of the organism life maintenance.

The fact that the average degree is close to 2 implies that the dynamical evolution of the
NHCD can be described by a critical branching process where every node has a certain
probability of generating offspring, in which case the critical condition for the branching to
continue is $\langle k \rangle = 2$ [31]. This effectively means that each node needs to give at least one
descendant in order for the network to keep growing. If $\langle k \rangle < 2$, the growth would stop
early, while for $\langle k \rangle > 2$ the growth would be faster than exponential.

The network reaches the condition of criticality, $\langle k \rangle \approx 2$, at around $t^* = 40$ days (Fig. 4d),
in conjunction with the transition from plasticity to functional behavior. After this, the
average degree remains just above criticality to sustain a growth rate that guarantees the
network survival. The majority of the nodes propagate the growth in a single line, but there
are nodes which generate significantly more descendants to generate the diversity implied
by the power-law distributions of degree and modularity.
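The quoted average degree follows directly from the node and link counts, since each link contributes to the degree of two nodes. The sketch below reproduces that arithmetic and adds a textbook property of branching processes: with a mean of m offspring per node, the expected size of generation g is m to the power g, so m = 1 is the borderline between extinction and growth. Treating the mean offspring number as the average degree minus one (one link to the parent, the rest to descendants) is an assumption made here for illustration, as are the function names.

```python
def average_degree(n_nodes, n_links):
    """Each link adds one to the degree of two nodes."""
    return 2.0 * n_links / n_nodes

def expected_generation_size(mean_offspring, generation):
    """Expected number of nodes in a given generation of a branching process."""
    return mean_offspring ** generation

k_avg = average_degree(873, 977)   # node and link counts given in the text
m = k_avg - 1.0                    # assumed: one link per node points to its parent

print(round(k_avg, 2))
for m_test in (0.9, 1.0, 1.1):
    print(m_test, expected_generation_size(m_test, 20))
```

With a mean below one the expected generation size decays, at exactly one it stays constant, and above one it grows, which is the sense in which an average degree of 2 marks the critical condition.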

II. DISCUSSION 

In summary, we present the first large-scale study of the prenatal evolution of the human 
cell differentiation process from the fertilized egg to a developed human. The process of 
human cell differentiation can be mapped onto a complex network composed of cell types 
and differentiation steps. This mapping allows us to study the cell differentiation process 
with state-of-the-art network theory for community detection, with the goal of identifying
hitherto unknown functional relations between cell types. 

We discover a dynamical law of critical branching explaining the emergence of the network
topology, which reveals a novel scale-invariant modular structure of the network of cell types.
The self-similar modular features evidenced in Figs. 1, 2 and 3 are established early in the
process and remain invariant during the evolution of the NHCD, although the network size
changes significantly.

Using this law, we are able to observe the network at different scales. The emerging
picture clearly identifies clusters of cell types, or modules, and their connectivity to other
modules within their own and other functions. The resulting hierarchical organization consists
of sub-modules, known biological functions and super-modules of specialized tissues and
organs emerging on varying scales. This discovery is useful in proposing future research ties
between previously unrelated domains in organ functions in a systematic way. Furthermore,
this information could be of importance in providing predictions of functional attributes to
a number of identified modules of cell types in the NHCD.



III. MATERIALS AND METHODS 



Module detection algorithm 

The detection of modules or boxes in our work follows from the application of the
box-covering algorithm [13, 29] at different length scales. The algorithm can be downloaded at
http://lev.ccny.cuny.edu/~hmakse/soft_data.html. In box covering we assign every
node to a module, by finding the minimum possible number of boxes, $N_B(\ell)$, that cover the
network and whose diameter (defined as the maximum distance between any two nodes in
this box) is smaller than $\ell$. These boxes are characterized by the proximity between all their
nodes, at a given length scale. Different values of the box diameter $\ell$ yield boxes of different
size. These boxes are identified as modules which at a smaller scale $\ell$ may be separated, but
merge into larger entities as we increase $\ell$.

In this work we implement the Maximum Excluded Mass Burning (MEMB) algorithm
from Ref. [29] for box covering. The algorithm uses the basic idea of box optimization, where we
require that each box should cover the maximum possible number of nodes, and works as
follows: For a given $\ell$, we first locate the optimal 'central' nodes which will act as the origins
for the boxes. This is done by first calculating the number of nodes (called the mass) within
a diameter $\ell$ from each node. The node that yields the largest mass is marked as a center.
Then we mark all the nodes in the box of this center node as 'tagged'. We repeat the process
of calculating the mass of the boxes starting from all non-center nodes, and we identify a
second center according to the largest remaining mass, while nodes in the corresponding box
are 'tagged', and so on. When all nodes are either centers or 'tagged' we have identified the
minimum number of centers that can cover the network at the given $\ell$ value. Starting from
these centers as box origins, we then simultaneously burn the boxes from each origin until
the entire network is covered, i.e. each node is assigned to one box (we call this process
burning since it is similar to burning algorithms developed to investigate clustering statistics
in percolation theory [12]). In Fig. 2a of the main text we show how box-covering works for
a simple network at two different $\ell$ values.
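The two stages described above, greedy selection of centers by largest remaining mass followed by simultaneous burning, can be sketched in a few lines. This is a simplified illustration and not the downloadable MEMB implementation: it assumes a connected, undirected network, measures the mass within a fixed radius (hypothetical parameter `radius`) around each node rather than reproducing the exact size convention of the published algorithm, and breaks ties by node order. All function names are assumptions.

```python
from collections import deque

def bfs_ball(adj, source, radius):
    """Nodes within chemical distance `radius` of `source`."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        if dist[u] == radius:
            continue
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return set(dist)

def greedy_box_cover(adj, radius):
    """Pick centers by largest uncovered mass, then burn boxes from them."""
    nodes = sorted(adj)
    balls = {u: bfs_ball(adj, u, radius) for u in nodes}
    uncovered = set(nodes)
    centers = []
    while uncovered:
        candidates = [u for u in nodes if u not in centers]
        # Excluded mass: nodes in the ball that are not yet tagged.
        best = max(candidates, key=lambda u: len(balls[u] & uncovered))
        centers.append(best)
        uncovered -= balls[best]
    # Simultaneous burning: multi-source BFS assigns each node to a nearest center.
    box = {c: c for c in centers}
    queue = deque(centers)
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in box:
                box[v] = box[u]
                queue.append(v)
    return centers, box

# Toy network: a chain of seven nodes 0-1-2-3-4-5-6.
adj = {i: [j for j in (i - 1, i + 1) if 0 <= j <= 6] for i in range(7)}
centers, box = greedy_box_cover(adj, radius=1)
print(centers, box)
```

On the seven-node chain with radius 1 the sketch selects three centers and assigns every node to the box of a nearest center, so each box has a diameter of at most two links.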

This algorithm is driven by the proximity between nodes and the maximization of the
mass associated with each module center [29]. Thus it detects boxes that maximize
modularity, Eq. (1). In the case of MEMB we have the additional benefit of detecting
modules at different scales, so that we can study the hierarchical character of modularity,
i.e. modules of modules, and we can detect whether modularity is a feature of the network
that remains scale-invariant.

The fractal dimension $d_B$ of a complex network is an exponent that determines how the
mass (equivalently: the number of nodes) around any given node scales with the length,
which in networks corresponds to the shortest distance between two nodes. In order to
numerically measure this exponent we optimally cover the network with boxes using the
MEMB algorithm. A box is a set of nodes where all distances $\ell_{ij}$ between any two nodes $i$
and $j$ in this set are smaller than a given value of $\ell$, the box size. Although there is a large
number of coverings, for every value of $\ell$ we want to find the one which gives the smallest
possible number of boxes, $N_B(\ell)$. Varying $\ell$ then yields the scaling relation Eq. (3). A finite
fractal dimension reveals fundamental organizational principles of the underlying network,
namely a self-similar structural character, where the network is built in a similar way even
though we observe it at different length-scales. The boxes that are identified through this
process correspond to the modules at varying scales.
Fig. 1. Complex network representation of the human cell differentiation
process. The first steps of the NHCD construction are shown in the inset of this figure. 
These steps, known to also be present in the formation of the majority of multicellular 
organisms, include the first cleavage of a fertilized egg, which is subsequently followed by the 
ball stage and the formation of primary germ cell layers, namely, the ectoderm, mesoderm, 
and endoderm. The fertilized egg is a totipotent stem cell. The blastocyst, in turn, gives 
rise to both trophoblast and inner cell mass. These two cells further differentiate into other 
types of cells, and so on. Following the above process until the fetus is fully developed 
yields the complex network shown in this figure. Each node, plotted as a circle, corresponds 
to a cell type and the edges to a differentiation step. The entire network originates from 
the fertilized egg (denoted by a red square) and leads to the specialized cells of a developed 
human. Filled circles correspond to nodes that survive at the end of the development process, 
while empty circles correspond to non-surviving cell types. Nodes in communities of known 
functions from the literature are indicated by different colors, except for those cell types 
with no functional annotation (see SI-Table I for association to the known functions C1 to
C19 extracted from the literature).

Fig. 2. Detection of modules and the network of modules at different scales.
a, Demonstration of the box-covering algorithm for a schematic network, following the
Maximum Excluded Mass Burning algorithm in [13, 29] (see Section III for full details).
We cover the network with the smallest possible number of boxes for a given $\ell$ value. This
is done in a two-stage process: (i) We detect the smallest possible number of box origins
(shown with cyan color) that provide the maximum number of nodes (mass) in each box,
according to the following optimization algorithm: We calculate the mass associated with
each node, and pick the first center as the node with largest mass and mark the nodes
in this box as 'tagged'. We repeat the process from the remaining non-center nodes to
identify a second center with the highest mass, and so on. (ii) We build the boxes through
simultaneous burning from these center nodes, until the entire network is covered with boxes.
For example, at $\ell = 3$ there are four boxes, where the maximum distance between any two
nodes in a box is smaller than $\ell$. Similarly, we can cover the same network with two boxes
at $\ell = 6$. These two boxes are the result of merging two of the four boxes at $\ell = 3$. b,
Detail of NHCD modules detected by the above box-covering algorithm for two particular
functions. The algorithm detects a hierarchy of sub-modules, known functions and
super-modules of size $\ell$ plotted in different colors. We show the identified modules corresponding
to C12-neural system and C13-eye system (full structure is in Fig. 2c and SI-Fig. 6), which
first appear at $\ell = 15$ and $\ell = 11$, respectively. At other scales the box-covering algorithm
detects new functional relations between cell types expressed in the obtained sub- and
super-modules. For instance, at $\ell = 11$ the neural lineage is further divided into two sub-modules,
while at $\ell = 19$ the two functions merge into a super-module. c, The network of modules at
different $\ell$ values, as detected through the box-covering algorithm. Every node corresponds
to one of the three following types, in terms of increasing scale: (i) Sub-modules (small
grey dots), which are fractions of a fully functional module, (ii) Known functional biological
modules (colored circles), whose color corresponds to the functions C1-C19, and (iii)
Super-modules (pie-charts), which represent the union of more than one known functional module,
described by the colors of the pie-chart. The links that stem from known functional modules
and super-modules are shown in red, and they progressively span the entire network as we
increase $\ell$.

Fig. 3. Modular properties of the NHCD. a, Degree of modularity of the network,
$\mathcal{M}(\ell)$, at different times $T_a$ (indicated in the figure) as a function of the scale of observation,
$\ell$. b, Number of boxes/modules, $N_B$, versus the size of the modules $\ell$ identified by the
box-covering algorithm for different networks at time $T_a$.

Fig. 4. Growth properties of the NHCD. a, Number of cell types in the network,
$N(t)$, as a function of time. We find precise information about the appearance time $T_a$ for
782 among the 873 cell types. Those cells with missing appearance time have not been taken
into account in this plot. Also shown is the time evolution of the number of surviving and
non-surviving cells. b, Number of nodes whose degree increases at time $t$ (red histogram)
and number of new links appearing in the network (blue histogram) as a function of time.
If all nodes were giving just one child then the two histograms would coincide. Inset: The
average number of new links per node at a given time can be found by dividing the two
histograms in the main plot. This plot shows how intense the activity is at that particular
time. Despite the variation in activity, the new connections average around 1, which gives
a critical branching ratio of $\langle k \rangle \approx 2$. c, Number of cell types versus the chemical distance
to the first node, $\ell_{N1}$. This distance is only determined by the connections between the cell
types, and is not influenced by the appearance time, so that we include all 873 cell types. d,
Average degree $\langle k \rangle$ of the network as a function of time showing that the network achieves
the condition of critical branching process $\langle k \rangle \approx 2$ at around $t^* = 40$ days.
SUPPLEMENTARY INFORMATION



Modularity map of the network of human cell differentiation 

IV. ADDITIONAL INFORMATION FOR NHCD 

FIGURE 5 provides an alternative representation of the NHCD. Different cell types
appearing as bifurcation points of the network are stacked along the vertical axis. In SI-Fig.
5a the horizontal axis corresponds to the shortest path $\ell_{N1}$ calculated from node N1, the
fertilized egg, to any given node, while in SI-Fig. 5b the same network is shown as a function
of the node appearance time $T_a$. The links emerging from each cell type follow a bifurcation
pattern ending up at the right side with $k - 1$ branches, each one of them representing one
of the more specialized cell types. Red nodes correspond to the surviving cell types, and
they preferentially appear at later times, while the non-surviving cell types, the blue nodes,
emerge during the early stages of the process. The color of the edges corresponds to one of
the 19 functional groups identified in Table I, as given in Fig. 1. Links that generate loops
are plotted in red.

Figure 5b contains the same information as Fig. 5a, but we plot each cell type according
to its time of appearance rather than as a function of the chemical distance to N1, as in
Fig. 5a. The white and yellow alternating vertical stripes divide the time axis into intervals in
days. The branches have been extended so that each cell appears only in the corresponding
interval. Colors and labels are the same as in Fig. 5a.

The catalog presented in Ref. [6] reports 407 distinct cell types in a healthy adult human
body, all of which can be identified in our network representation. Most of them occupy
the end points of the 529 branches in Figs. 1 and 5. Since there are more branches than
adult cell types, not all tree leaves (branch endpoints) correspond to cell types in born humans.

The average shortest path calculated from all cell types to N1 is $\langle \ell_{N1} \rangle = 10.93$, as
expected from the large concentration of links in the interval $8 < \ell < 13$ (Fig. 5a).

FIGURE 6 shows the full modular structure of the NHCD as detected by the
box-covering algorithm at different length scales. A detail of this process is represented in Fig.
2 in the main text. A list containing the nodes belonging to each module at a given $\ell$ is
contained in the file modules.txt.

FIGURE 7 shows the degree distribution of the NHCD for different times.



Table I lists the different known functional modules of the NHCD and the respective
citations to the literature. The complete collected data is listed in the datafile links.txt.
This file includes the links between the cell types, their time of appearance in days after
fecundation ($T_a$), and the known functional module they belong to. The references to the
publications reporting each link appear in the table. Data on the structure of individual 
FIG. 5: Alternative representation of the NHCD. (a) The horizontal axis measures the
shortest path from each cell type to the fertilized egg along the network, $\ell_{N1}$. (b) The horizontal
axis denotes the appearance time of a given cell type, $T_a$. For simplicity we do not plot the links
leading to loops. The colors of the branches denote the functional classes as in Fig. 1.
FIG. 6: Full modular structure of the NHCD at the indicated length $\ell$, as detected by
the box-covering algorithm. Each node is depicted with a different color indicating the module
to which it belongs.
TABLE I: Identification of the distinct biological functions of the cell types indicated in Fig. 1 and
SI-Fig. 5. The third column lists the references used for building the NHCD.

Label | Biological function | Reference
1 | Germinative Lineage and Placenta | Alberts et al., 2002; Kirschstein and Skirboll, 2001; Sadler, 2004; Sell, 2004
2 | Skeletal System | Bianco et al., 2001; Freitas, 1999; Mochida, 2005; Sadler, 2004; Sell, 2004; Sanders et al., 1999; Towler and Gelberman, 2006; Vickaryous and Hall, 2006
3 | Skeletal Muscular System | Chen and Goldhamer, 2002; Sadler, 2004; Sell, 2004; Vickaryous and Hall, 2006
4 | Hematopoietic Lineage | Alberts et al., 2002; Kirschstein and Skirboll, 2001; Janeway et al., 2001; Minasi et al., 2002; Paxinos and Mai, 2004; Sadler, 2004; Sell, 2004; Vickaryous and Hall, 2006
5 | Urogenital System | Anglani et al., 2004; Coulter, 2004; Horster et al., 1999; Lopez et al., 2001; Sadler, 2004; Sell, 2004; Vickaryous and Hall, 2006
6 | Cardiovascular System | Sadler, 2004; Sell, 2004; Vickaryous and Hall, 2006
7 | Mesothelial Lineage | Herrick and Mutsaers, 2004; Sadler, 2004
8 | Respiratory System | Freitas, 1999; Otto, 2002; Sadler, 2004; Sell, 2004
9 | Digestive System | Bardeesy and DePinho, 2002; Fausto, 2004; Freitas, 1999; Sadler, 2004; Sell, 2004; Vickaryous and Hall, 2006
10 | Pharyngeal Lineage | Blackburn and Manley, 2004; Freitas, 1999; Sadler, 2004; Vickaryous and Hall, 2006
11 | Cloacal Lineage | Foster et al., 2002; Freitas, 1999; Sadler, 2004; Sell, 2004; Vickaryous and Hall, 2006
12 | Neural Lineage | Freitas, 1999; Kirschstein and Skirboll, 2001; Paxinos and Mai, 2004; Sadler, 2004; Sell, 2004; Temple, 2001; Vickaryous and Hall, 2006
13 | Eye Lineage | Paxinos and Mai, 2004; Sadler, 2004; Sell, 2004; Vickaryous and Hall, 2006
14 | Neural Crest Lineage | Jessen and Mirsky, 2005; Nakashima and Reddi, 2003; Sadler, 2004; Santagati and Rijli, 2003; Sell, 2004; Szeder et al., 2003; Vickaryous and Hall, 2006
15 | Adenohypophysis | Paxinos and Mai, 2004; Sadler, 2004; Savage et al., 2003; Vickaryous and Hall, 2006
16 | Primitive Oral Cavity | Freitas, 1999; Nakashima and Reddi, 2003; Sadler, 2004; Vickaryous and Hall, 2006
17 | Ear | Forge and Wright, 2002; Freitas, 1999; Paxinos and Mai, 2004; Sadler, 2004; Vickaryous and Hall, 2006
18 | Nose | Freitas, 1999; Sadler, 2004; Vickaryous and Hall, 2006
19 | Integumentary System | Freitas, 1999; Hennighausen and Robinson, 2005; Panteleyev et al., 2001; Potten and Booth, 2002; Sadler, 2004; Stoeckelhuber et al., 2003; Vickaryous and Hall, 2006